Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection
نویسندگان
چکیده
This study proposes a two-pass spoken term detection (STD) method. The first pass uses a phoneme-based dynamic time warping (DTW)-based STD, and the second pass recomputes detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second-pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models train recognition error patterns such as phoneme-to-phoneme confusions in the CRF framework. Consequently, the models can detect a triphone comprising a query term with a detection probability. In the experimental evaluation of two types of test collections, the CRF-based approach worked well in the re-ranking process for the DTW-based detections. CRF-based re-ranking showed 2.1% and 2.0% absolute improvements in F-measure for each of the two test collections. key words: conditional random fields, phoneme-to-phoneme confusion learning, re-ranking, spoken term detection, triphone detection
منابع مشابه
Combination of DTW-based and CRF-based Spoken Term Detection on the NTCIR-11 SpokenQuery&Doc SQ-STD Subtask
Conventional spoken term detection (STD) techniques, which use a text-based matching approach based on automatic speech recognition (ASR) systems, are not robust for speech recognition errors. This paper proposes a conditional random fields (CRF)-based combination (re-ranking) approach, which recomputes detection scores produced by a phonemebased dynamic time warping (DTW) STD approach. In the ...
متن کاملAn IWAPU STD System for OOV Query Terms and Spoken Queries
We have been proposing a Spoken Term Detection (STD) method for Out-Of-Vocabulary (OOV) query terms integrating various subword recognition results using monophone, triphone, demiphone, one third phone, and Sub-phonetic segment (SPS) models[1][2]. In this paper, we describe two methods for text OOV query terms and spoken queries. For text OOV query terms, we introduce four unique methods. First...
متن کاملGraph-based re-ranking using acoustic feature similarity between search results for spoken term detection on low-resource languages
Acoustic feature similarity between search results has been shown to be very helpful for the task of spoken term detection (STD). A graph-based re-ranking approach for STD has been proposed based on the concept that search results, which are acoustically similar to other results with higher confidence scores, should have higher scores themselves. In this approach, the similarity between all sea...
متن کاملSpoken question answering using tree-structured conditional random fields and two-layer random walk
In this paper, we consider a spoken question answering (QA) task, in which the questions are in form of speech, while the knowledge source for answers are the webpages (in text) over the Internet to be accessed by an information retrieval engine, and we mainly focus on query formulation and re-ranking part. Because the recognition results for the spoken questions are less reliable, we use N-bes...
متن کاملIntensive acoustic models constructed by integrating low-occurrence models for spoken term detection
Triphone acoustic models are often used as subword models for detecting out-of-vocabulary query terms in Spoken Term Detection (STD) systems. Our preliminary experiments revealed that the training data for a large portion of the approximately 8,000 triphone models are insufficient. Assuming that such insufficient models deteriorate the performance of STD, this paper proposes intensive triphone ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IEICE Transactions
دوره 99-D شماره
صفحات -
تاریخ انتشار 2016